Abstract. We are interested in the problem of transition reduction of nondeterministic automata. We present some results on the reduction of the automata recognizing the language $L(E_n)$ denoted by the regular expression $E_n = (1+\varepsilon)\cdot(2+\varepsilon)\cdot(3+\varepsilon)\cdots(n+\varepsilon)$. These results can be used in the general case of the transition reduction problem.



1 Introduction 

Minimizing the number of states of an automaton is a subject that has been studied extensively since the 1950s, both in the deterministic case and in the nondeterministic case [11,7]. However, work on minimizing the number of transitions has appeared only recently.

In 1997, J. Hromkovic et al. [8] proposed an algorithm, based on the concept of Common Follow Sets of a regular expression, that converts a regular expression of size $n$ into a finite state automaton with $O(n)$ states, with $\Omega(n \log n)$ transitions as a lower bound and $O(n \log^2 n)$ transitions as an upper bound. Muscholl et al. [6] showed that this algorithm can be implemented in time $O(n \log^2 n)$. In [9], Ouardi and Ziadi, based on the ZPC structure [2], gave an $O(n \log^2 n)$ algorithm to convert a weighted regular expression of size $n$ into a weighted automaton having $O(n)$ states and $O(n \log^2 n)$ transitions. In [5], Viliam Geffert showed that every regular expression of size $n$ over a fixed alphabet of $s$ symbols can be converted into a nondeterministic $\varepsilon$-free finite state automaton with $O(sn \log n)$ transitions.

The lower bound was improved by Yuri Lifshits [15] to $\Omega\!\left(\frac{n \log^2 n}{\log\log n}\right)$; afterwards, Schnitger [13] improved it to $\Omega(n \log^2 n)$ transitions.

In [3], R. Cox carried out an exhaustive search to find the transition-minimal automata of $L(E_n)$ for $n = 1$ to $7$. He also used a heuristic approach that constructs transition-reduced automata for $n = 8$ to $10$.

Here, we produce an algorithm for which the number of transitions is minimal for the class of languages $L(E_n)$, in the sense that, asymptotically, this number of transitions is equivalent to $n \log^2 n$ (see Section 6).

We mention that most of the complexity results mentioned above are obtained from the study of the class of languages $L(E_n)$. This class of languages corresponds to a simple class of automata for which minimizing the number of transitions is nevertheless difficult and not obvious. The study of this class of languages can also find applications in bioinformatics, since $L(E_n)$ is exactly the set of all subsequences of the word $1 \cdot 2 \cdot 3 \cdots n$.

Our approach to reducing the number of transitions of a nondeterministic homogeneous finite state automaton is based on the decomposition of the transition table of the automaton into blocks. This decomposition is based on the concept of Common Follow Sets. From a block decomposition we construct an automaton with fewer transitions than the initial automaton. See Figure 1.

The main problem in our approach is to find a good block decomposition of the transition table (ideally the best one). Even in the case where this matrix is lower triangular or upper triangular, finding a minimal block decomposition leading to a transition-minimal automaton is not obvious. Our study is focused on the upper triangular matrix, which corresponds to the transition table of the deterministic minimal automaton recognizing the language $L(E_n)$. The case of a lower triangular matrix can be treated in a similar manner.

Fig. 1. The reduced automaton (at right) is obtained from a decomposition of the transition table (at right bottom) of the homogeneous automaton (at left). The automaton A(Ls) is the part of the reduced automaton which represents the triangle (in the transition table decomposition).

In this paper we present the following results. First, in Section 3 we extend the concept of Common Follow Sets to homogeneous automata. Then, in Section 4 we introduce particular decompositions called Z-partitions associated with the expressions $E_n$. Then, in Section 5 we introduce the notion of Z-tree to represent any Z-partition by a binary tree, and we propose an algorithm of $O(n \log n)$ time complexity to generate the Z-minimal trees. We finish our study with experimental results and a last section in which we show that our algorithms construct automata with a number of transitions equivalent to $n \log^2(n)$, which matches the lower bound of Schnitger.



2 Notation and terminology 

We recall the basics of regular expressions, languages and finite state machines and introduce the notation that we use. Let $\Sigma$ be a non-empty finite set of symbols, called the alphabet. The set of all words over $\Sigma$ is denoted by $\Sigma^*$. The empty word is denoted by $\varepsilon$. A language over $\Sigma$ is a subset of $\Sigma^*$. A finite automaton over $\Sigma$ is a 5-tuple $A = (Q, \Sigma, I, \delta, F)$ where $Q$ is a set of states, $I$ is a subset of $Q$ whose elements are the initial states, $F$ is a subset of $Q$ whose elements are the final states, and $\delta$ is a subset of the cartesian product $Q \times \Sigma \times Q$ whose elements are the transitions. A transition $(q, a, p) \in \delta$ goes from the head $q$ to the tail $p$. A path in $A$ is a sequence $(q_i, a_i, q_{i+1})$, $i = 1$ to $n$, of consecutive transitions. Its label is the word $w = a_1 a_2 \cdots a_n$. A word $w \in \Sigma^*$ is recognized by the automaton $A$ if there is a path with label $w$ such that $q_1 \in I$ and $q_{n+1} \in F$. The language recognized by the automaton $A$ is the set of words that are recognized by $A$. The automaton $A$ is homogeneous if for all $(q, a, p), (q', a', p') \in \delta$, $p = p'$ implies that $a = a'$; in this case we write $h(p) = a$. The function $h$ assigns to each non-initial state $q$ of a homogeneous automaton the symbol that is the unique label of all the transitions having $q$ as tail.
In Appendix A we recall the basics of asymptotic notations.



3 CFS for homogeneous automata 

J. Hromkovic et al. [8] have given an elegant algorithm based on the notion of Common Follow Sets, to convert a regular expression of size $n$ into a nondeterministic finite automaton having $O(n)$ states and $O(n \log^2 n)$ transitions. This notion can be easily extended to homogeneous automata.

Let $A = (Q, \Sigma, \{q_0\}, \delta, F)$ be a homogeneous automaton. In order to capture the final states of $A$, we introduce a dummy state denoted by # which is not in $Q$. We define over $Q$ the function follow as follows:



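$$\mathrm{follow}(q) = \begin{cases} \{p \mid (q, a, p) \in \delta\} \cup \{\#\} & \text{if } q \in F,\\ \{p \mid (q, a, p) \in \delta\} & \text{otherwise.} \end{cases}$$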



Let $q \in Q$ be a state in $A$. We denote by $\mathrm{dec}(q) = \{Q_1, Q_2, \ldots, Q_k\}$ (where $Q_i \subseteq \mathrm{follow}(q)$) any decomposition of the set $\mathrm{follow}(q)$, i.e. $\mathrm{follow}(q) = \bigcup_{Q_i \in \mathrm{dec}(q)} Q_i$.

In the case where $\mathrm{dec}(q)$ is a partition of the set $\mathrm{follow}(q)$, the decomposition $\mathrm{dec}(q)$ will be called a partition decomposition. Figure 2 provides examples of decompositions.



Fig. 2. We have follow(0) = {1, 2, 3, #}. Here are three possible decompositions of follow(0). The first two are partition decompositions. (i) dec(0) = {{1, 2}, {3, #}} (ii) dec(0) = {{1}, {2}, {3, #}} (iii) dec(0) = {{1, 2}, {2, 3, #}}.

Definition 1 (Common Follow Sets System). Let $A$ be a homogeneous automaton. A CFS system for $A$ is given as $S(A) = (\mathrm{dec}(q))_{q \in Q}$, where each $\mathrm{dec}(q) \subseteq 2^{Q}$ is a decomposition of $\mathrm{follow}(q)$.

Definition 2 (CFS automaton). Let $A = (Q, \Sigma, \{q_0\}, \delta, F)$ be a homogeneous automaton and $S(A)$ an associated Common Follow Sets system. The Common Follow Sets automaton associated with $S(A)$ is defined by $C_{S(A)} = (Q', \Sigma, I', \delta', F')$ where

- $Q' = \bigcup_{q \in Q} \mathrm{dec}(q)$

- $I' = \mathrm{dec}(q_0)$

- For $Q_1 \in Q'$, $Q_1 \in F'$ if and only if $\# \in Q_1$

- $\delta' = \{(Q_1, a, Q_2) \mid \exists q \in Q_1 \text{ s.t. } h(q) = a \text{ and } Q_2 \in \mathrm{dec}(q)\}$.

Theorem 1. Let $A$ be a homogeneous automaton, $S(A)$ be a Common Follow Sets system associated with $A$ and $C_{S(A)}$ its Common Follow Sets automaton. Then $C_{S(A)}$ and $A$ recognize the same language.

This theorem can be proved in the same way as Theorem 5 of the paper of J. Hromkovic et al. [8].
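Definition 2 and Theorem 1 can be illustrated on a small case. The following Python sketch is only an illustration under stated assumptions: the names `cfs_automaton`, `language` and `HASH` are ours, states of the CFS automaton are represented as frozensets, and the decomposition used is the first one listed in Example 1 of Section 4 for $A_3$. It builds $C_{S(A)}$ and enumerates the accepted words by a bounded forward search.

```python
HASH = "#"  # dummy state marking final states (name chosen for this sketch)

def cfs_automaton(dec, q0, h):
    """Build the CFS automaton of Definition 2.

    dec: dict mapping each state q to a list of frozensets (a decomposition of follow(q))
    q0:  the initial state of the homogeneous automaton
    h:   dict mapping each non-initial state to its unique incoming label
    """
    states = {block for blocks in dec.values() for block in blocks}
    initial = set(dec[q0])
    finals = {block for block in states if HASH in block}
    delta = {(Q1, h[q], Q2)
             for Q1 in states
             for q in Q1 if q != HASH
             for Q2 in dec[q]}
    return states, initial, finals, delta

def language(initial, finals, delta, max_len):
    """All accepted words of length <= max_len, as tuples of symbols."""
    accepted, frontier = set(), {((), q) for q in initial}
    for _ in range(max_len + 1):
        accepted |= {w for (w, q) in frontier if q in finals}
        frontier = {(w + (a,), r) for (w, q) in frontier
                    for (p, a, r) in delta if p == q}
    return accepted

fs = frozenset
# First partition system of Example 1 (automaton A_3).
dec = {0: [fs({1, 2, 3, HASH})],
       1: [fs({2}), fs({3, HASH})],
       2: [fs({3, HASH})],
       3: [fs({HASH})]}
h = {1: 1, 2: 2, 3: 3}          # in A_n every state q > 0 is entered by the symbol q
states, initial, finals, delta = cfs_automaton(dec, 0, h)
words = language(initial, finals, delta, 3)
```

On this input the accepted words are exactly the subsequences of the word 1.2.3, as Theorem 1 predicts for $A_3$.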

To evaluate the number of transitions in the automaton $C_{S(A)}$ we define over the states of $A$ two functions:

- $a(q) = |\mathrm{dec}(q)|$, the size of the decomposition of the set $\mathrm{follow}(q)$;

- $b(q) = |\{Q_i \in Q' \mid q \in Q_i\}|$, the number of states in $Q'$ that contain the state $q$.

Lemma 1. The number of transitions $T$ in $C_{S(A)}$ is such that $T \le \sum_{q \in Q} a(q)\,b(q)$.

It is easy to see that if for all $p, q \in Q \setminus \{q_0\}$ such that $p \ne q$ we have $h(p) \ne h(q)$, then the equality holds. From Lemma 1 we can see that the number of transitions in a CFS automaton depends on the decomposition system. A decomposition which is not a partition will induce more transitions than a partition decomposition. Therefore, in the following we are interested only in partition decompositions. As mentioned in the introduction, our study focuses on the CFS automata associated with the family of automata $(A_n)_{n \ge 1}$. The automaton $A_n = (Q, \Sigma, I, \delta, F)$ is defined by:



- $\Sigma = \{1, 2, \ldots, n\}$

- $Q = \Sigma \cup \{0\}$

- $F = Q$

- $I = \{0\}$, $\delta = \{(p, q, q) \in Q \times \Sigma \times Q \mid q > p\}$.

Figure 3 shows two CFS automata associated with the automaton $A_3$.
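The definition of $A_n$ can be checked mechanically on small values of $n$. The sketch below is a minimal illustration (the helper names `build_A`, `language` and `subsequences` are ours, not part of the paper): it builds $A_n$ as a set of triples and compares the accepted words with the subsequences of the word 1.2...n.

```python
from itertools import combinations

def build_A(n):
    """Return (states, initial, finals, delta) for A_n; delta is a set of (p, a, q)."""
    states = set(range(n + 1))
    # a transition p --q--> q exists whenever q > p; every state is final
    delta = {(p, q, q) for p in states for q in range(1, n + 1) if q > p}
    return states, {0}, set(states), delta

def language(initial, finals, delta, max_len):
    """All accepted words of length <= max_len, as tuples of symbols."""
    accepted, frontier = set(), {((), q) for q in initial}
    for _ in range(max_len + 1):
        accepted |= {w for (w, q) in frontier if q in finals}
        frontier = {(w + (a,), r) for (w, q) in frontier
                    for (p, a, r) in delta if p == q}
    return accepted

def subsequences(n):
    """All subsequences of the word 1.2...n, as tuples."""
    letters = range(1, n + 1)
    return {c for k in range(n + 1) for c in combinations(letters, k)}

states, initial, finals, delta = build_A(4)
```

The transition table of $A_n$ is the full upper triangle, so it has $n(n+1)/2$ transitions; this is the quantity that the decompositions of the next sections try to reduce.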

In the next sections we present two algorithms that construct particular CFS 
systems which correspond to CFS automata with a reduced number of transi- 
tions. In the last section we give comparative and experimental results. 

Fig. 3. Two CFS automata constructed from the automaton $A_3$ shown in Figure 2.



4 A Reduction Algorithm

The notion of Z-partition is nowhere introduced formally. In the following we are interested in reducing the number of transitions in the automaton $A_n$. The following algorithm computes particular CFS systems $S(A_n)$ that provide $C_{S(A_n)}$ automata with a small number of transitions and having $n + 1$ states. Let $E_n = (1+\varepsilon)\cdot(2+\varepsilon)\cdot(3+\varepsilon)\cdots(n+\varepsilon)$ be a regular expression. It is easy to see that the language denoted by the expression $E_n$ is exactly the language recognized by the automaton $A_n$. We have:

Proposition 1. Each transition-minimal automaton that recognizes $L(E_n)$ has exactly $n + 1$ states.

This proposition can be proved using properties of the universal automaton [10] of $L(E_n)$.

For fixed $n$, a CFS system produced by the following algorithm will be denoted by $Z(A_n)$. The set of all $Z(A_n)$ will be denoted $CFSZ(n)$. Our aim in this section is to compute all minimal decompositions in $CFSZ(n)$.

Algorithm 1 CFSPartitions(n)

Require: n ∈ N
Ensure: Z(A_n)
 1: Q ← {0, 1, 2, 3, 4, ..., n}
 2: for i = 0 to n do
 3:     follow(i) ← {j ∈ Q | j > i} ∪ {#}
 4:     dec(i) ← ∅
 5: end for
 6: for i = 0 to n do
 7:     Choose j in Q
 8:     Q ← Q \ {j}
 9:     Q_j ← follow(j)
10:     for all k ∈ Q ∪ {j} do
11:         if (Q_j ⊆ follow(k)) then
12:             dec(k) ← dec(k) ∪ {Q_j}
13:             follow(k) ← follow(k) \ Q_j
14:         end if
15:     end for
16: end for

Proposition 2. The number of all $Z(A_n)$ CFS partition systems is the $n$th Catalan number: $|CFSZ(n)| = \frac{1}{n+1}\binom{2n}{n}$.

The successive choices of the values of $j$ (line 7) lead to a permutation of the states $0, \ldots, n$ (see Example 1). So, each CFS partition system $Z(A_n)$ can be associated with at least one such permutation.
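As an illustration, Algorithm 1 can be sketched in Python with the nondeterministic choice of $j$ replaced by an explicit order given by the caller. This is a sketch under assumptions: the names `cfs_partitions` and `transition_bound` are ours, the chosen state $j$ is itself included in the inner loop so that it receives its own block, and the inner loop runs over all states, which is harmless because already chosen states have an empty follow set.

```python
HASH = "#"  # dummy state (name chosen for this sketch)

def cfs_partitions(n, order):
    """Run Algorithm 1 on A_n, choosing j in the given order (a permutation of 0..n)."""
    states = list(range(n + 1))
    follow = {i: set(range(i + 1, n + 1)) | {HASH} for i in states}
    dec = {i: [] for i in states}
    for j in order:
        block = frozenset(follow[j])          # Q_j, copied before follow(j) is modified
        for k in states:
            if block and block <= follow[k]:  # Q_j is a subset of follow(k)
                dec[k].append(block)
                follow[k] -= block
    return dec

def transition_bound(dec):
    """Sum of a(q) * b(q) over all states q (the quantity of Lemma 1)."""
    blocks = {b for bs in dec.values() for b in bs}
    return sum(len(dec[q]) * sum(q in b for b in blocks) for q in dec)

dec_a = cfs_partitions(3, (0, 2, 1, 3))
dec_b = cfs_partitions(3, (1, 0, 2, 3))
```

For $A_n$ all states carry distinct labels, so by the remark following Lemma 1 the value returned by `transition_bound` is the exact number of transitions of the corresponding CFS automaton.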



The following algorithm is a recursive version of Algorithm 1. Its first call is RecursiveDecomposition(0, n). Without loss of generality, in this algorithm we identify the dummy state # with the number $n + 1$. Each call of Algorithm 2 constructs one block of the transition matrix $M$: for the call RecursiveDecomposition($n_1$, $n_2$) and the choice of $j$ (line 2), it produces the block $B_j$, which is the submatrix $M[n_1..j;\ j+1..n_2+1]$ (rows $n_1$ to $j$, columns $j+1$ to $n_2+1$).

Algorithm 2 RecursiveDecomposition(n_1, n_2)

Require: n_1, n_2 ∈ N
Ensure: Z(A_n) when n_1 = 0 and n_2 = n
 1: if n_1 ≤ n_2 then
 2:     Choose an integer j between n_1 and n_2, j ∈ {n_1, ..., n_2}
 3:     Q_j ← {j + 1, ..., n_2 + 1}
 4:     for k = n_1 to j do
 5:         dec(k) ← dec(k) ∪ {Q_j}
 6:     end for
 7:     B_j = M[n_1..j; j+1..n_2+1]
 8:     RecursiveDecomposition(n_1, j − 1)
 9:     RecursiveDecomposition(j + 1, n_2)
10: end if
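A possible Python rendering of Algorithm 2 is sketched below. The function name `recursive_decomposition`, the `choose` callback that resolves the nondeterministic choice, and the helper `by_priority` are assumptions made for this illustration; the dummy state # is encoded as $n + 1$, as in the text.

```python
def recursive_decomposition(n1, n2, choose, dec, order):
    """Fill dec (dict: state -> list of frozensets) for the rows n1..n2.

    choose(n1, n2) must return an integer j with n1 <= j <= n2.
    order records the chosen values of j, i.e. the order in which blocks are built.
    """
    if n1 <= n2:
        j = choose(n1, n2)
        Qj = frozenset(range(j + 1, n2 + 2))   # {j+1, ..., n2+1}
        for k in range(n1, j + 1):
            dec[k].append(Qj)
        order.append(j)                         # block B_j: rows n1..j, columns j+1..n2+1
        recursive_decomposition(n1, j - 1, choose, dec, order)
        recursive_decomposition(j + 1, n2, choose, dec, order)

def by_priority(priority):
    """Chooser that picks the first state of `priority` lying in [n1, n2]."""
    return lambda n1, n2: next(j for j in priority if n1 <= j <= n2)

n = 3
dec, order = {k: [] for k in range(n + 1)}, []
recursive_decomposition(0, n, by_priority((0, 2, 1, 3)), dec, order)
```

With the priority (0, 2, 1, 3) the blocks are produced in the order $B_0$, $B_2$, $B_1$, $B_3$, and the resulting `dec` coincides with the first system of Example 1 once 4 is read as #.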





Example 1. In this example we show the CFS partition systems associated with the permutations (0, 2, 1, 3) and (1, 0, 2, 3).

For the permutation (0, 2, 1, 3):

dec(0) = {{1, 2, 3, #}}
dec(1) = {{2}, {3, #}}
dec(2) = {{3, #}}
dec(3) = {{#}}

For the permutation (1, 0, 2, 3):

dec(0) = {{1}, {2, 3, #}}
dec(1) = {{2, 3, #}}
dec(2) = {{3, #}}
dec(3) = {{#}}

For the permutation (0, 2, 1, 3), the first block $B_0$ is the row [1 2 3 #], the second block $B_2$ is the 2 x 2 matrix with rows [3 #] and [3 #], the third block $B_1$ is [2] and $B_3$ = [#] is the last one.



Proposition 3. The computation of all minimal partition systems $Z(A_n)$ in $CFSZ(n)$ can be done in time $O(n!)$.

Proof. This can be done by calling the nondeterministic Algorithm 1 or 2 for each possible execution.
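The brute-force search of Proposition 3 can be sketched for small $n$ as follows. This is an illustration under assumptions: the function names are ours, a system is encoded as a tuple of triples $(n_1, j, n_2)$ recording the calls of Algorithm 2, the number of transitions is computed as $\sum_q a(q)b(q)$ (exact for $A_n$ by Lemma 1), and the search is restricted to systems whose first choice is $j = 0$, which yields a unique initial state.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def all_block_lists(n1, n2):
    """All executions of Algorithm 2 on rows n1..n2, each as a tuple of (n1, j, n2)."""
    if n1 > n2:
        return ((),)
    out = []
    for j in range(n1, n2 + 1):
        for left in all_block_lists(n1, j - 1):
            for right in all_block_lists(j + 1, n2):
                out.append(((n1, j, n2),) + left + right)
    return tuple(out)

def transitions(blocks, n):
    """Sum of a(q) * b(q); block (n1, j, n2) has rows n1..j and columns j+1..n2+1."""
    a = [0] * (n + 2)
    b = [0] * (n + 2)
    for n1, j, n2 in blocks:
        for k in range(n1, j + 1):
            a[k] += 1
        for q in range(j + 1, n2 + 2):
            b[q] += 1
    return sum(a[q] * b[q] for q in range(n + 1))

def minimal_single_initial(n):
    """(minimal number of transitions, number of systems reaching it), first choice j = 0."""
    systems = [((0, 0, n),) + rest for rest in all_block_lists(1, n)]
    counts = [transitions(s, n) for s in systems]
    best = min(counts)
    return best, counts.count(best)
```

The enumeration is exponential in $n$ and is only meant for cross-checking small cases against the values reported later in Table 1.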

Remark 1. By using dynamic programming, we can improve the exponential brute-force method to a polynomial algorithm, as shown in Algorithm ??.



In the following sections we will introduce our second algorithm which is based 
on trees. It computes efficiently the reduced Z(A n ) systems. 

5 Tree based reduction 

A binary tree is a structure defined on a finite set of nodes that either contains no nodes, or is made of three disjoint sets of nodes:

- a root node

- a binary tree called its left subtree

- a binary tree called its right subtree.

The binary tree that contains no nodes is called the empty tree. If the left subtree is non-empty, its root is called the left child of the root of the entire tree. Likewise, the root of a non-empty right subtree is the right child of the root of the entire tree. In a full binary tree each node is either a leaf or has degree exactly 2; there are no degree-1 nodes. In the following we call an $n$-tree a full binary tree with $n$ leaves. There is a unique $n$-tree for $n = 0$ to $2$.

Let $t$ be an $n$-tree and let $\pi$ be a path in $t$. The left weight $a_\pi$ (resp. right weight $b_\pi$) is defined as the number of left (resp. right) edges in the path $\pi$. The length of $\pi$ is denoted by $l_\pi = a_\pi + b_\pi$. Denote by $w_\pi = a_\pi b_\pi$ the weight of $\pi$. The cost of $\pi$ is the sum of its weight and its length, so we have $c_\pi = w_\pi + l_\pi$.

Let $v$ be a node in $t$. Denote by $\pi_v$ the path from the node $v$ to the root of $t$. Denote by $v_l$ (resp. $v_r$) the left child of $v$ (resp. the right child of $v$). Denote by $f_v$ the father$^4$ of $v$. If $\pi$ is a path from the node $v$ to the root of $t$, then we denote by $f_\pi$ the path from the node $f_v$ to the root of $t$. We also associate $a_{\pi_v}$, $b_{\pi_v}$, $w_{\pi_v}$, $l_{\pi_v}$ and $c_{\pi_v}$ to the node $v$ and we denote them respectively by $a_v$, $b_v$, $w_v$, $l_v$ and $c_v$. The set of leaves of a tree $t$ will be denoted by $L_t$. The weight $w(t)$ of the tree $t$ is defined as the sum of the weights of its leaves, that is, $w(t) = \sum_{v \in L_t} w_v$.
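These quantities are straightforward to compute. In the following sketch (our own encoding, not part of the paper) a leaf is represented by `None` and an internal node by a pair `(left, right)`; the functions return the labels $(a_v, b_v)$ of the leaves, the cost $c = ab + a + b$ and the weight $w(t)$.

```python
def leaf_labels(tree, a=0, b=0):
    """Yield (a_v, b_v) for every leaf v: numbers of left and right edges up to the root."""
    if tree is None:                 # a leaf
        yield (a, b)
    else:
        left, right = tree
        yield from leaf_labels(left, a + 1, b)
        yield from leaf_labels(right, a, b + 1)

def cost(a, b):
    """Cost of a path: weight a*b plus length a+b."""
    return a * b + a + b

def weight(tree):
    """w(t): sum of the weights a_v * b_v over the leaves of t."""
    return sum(a * b for a, b in leaf_labels(tree))

two_tree = (None, None)              # the unique 2-tree
three_tree = ((None, None), None)    # a 3-tree whose left child is internal
```

For instance, `three_tree` has leaves labelled (2, 0), (1, 1) and (0, 1), with costs 2, 3 and 1, and weight 1.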

Proposition 4. Each $Z(A_n)$ partition system corresponds to a unique $n$-tree.

Proof. The idea is that the execution trace of the recursive Algorithm 2 corresponds to a binary tree whose weight is the number of transitions of the reduced automaton. By induction we can prove that for each state $q$ we have $a_{v_q} = a(q)$ and $b_{v_q} = b(q)$. See Figure 4.

Fig. 4. A full binary 5-tree and an associated $Z(A_5)$ partition (left edges are represented by dotted lines and right edges by solid lines).



So, finding a minimal $Z(A_n)$ partition system is reduced to finding an $n$-tree having minimal weight. We call such a tree a Z-tree of rank $n$.

Let Split(t) be the function that returns the tree obtained from $t$ by replacing a leaf having minimal cost in $t$ by the unique 2-tree. See Figure 5.

Proposition 5. The set of Z-trees can be computed inductively as follows:

- the 1-tree is the Z-tree of rank one;

- if $t$ is a Z-tree (of rank $i$) then Split(t) is a Z-tree (of rank $i + 1$).

$^4$ The first ancestor.

Proof. Let $t_n$ be a Z-tree of rank $n$. To get a tree $t_{n+1}$ of rank $n + 1$ from $t_n$ we have to split a leaf $\mu$. The weight of $t_{n+1}$ is:

$$\begin{aligned} w(t_{n+1}) &= \sum_{v \in L_{t_{n+1}}} w_v \\ &= \Big(\sum_{v \in L_{t_n}} w_v\Big) - w_\mu + w_{\text{left-child}(\mu)} + w_{\text{right-child}(\mu)} \\ &= w(t_n) - a_\mu b_\mu + (a_\mu + 1) b_\mu + a_\mu (b_\mu + 1) \\ &= w(t_n) + c_\mu \end{aligned}$$

If $\mu$ is a leaf of $t_n$ which has minimal cost, then the tree $t_{n+1}$ has minimal weight.

So, this inductive construction yields a minimal weight tree. The difference of weights between two consecutive minimal trees is exactly the cost of the split leaf. All Z-trees of rank less than $n$ can be generated by the following Algorithm 3.



Algorithm 3 MinZtree(n)

Require: n ∈ N
Ensure: Z-tree of rank less than n
1: t ← 1-tree
2: for i = 1 to n do
3:     t ← Split(t)
4: end for



Theorem 2. Algorithm 3 computes one Z-tree of rank $i$ for all $i = 1$ to $n$ in $O(n \log n)$ time.

Proof. At each step of this algorithm we look for a minimal cost leaf and then we split it. We can maintain the costs of the leaves in a dynamic structure which allows a logarithmic time search for the minimal cost leaf and also a logarithmic time insertion of the two leaves obtained from the split function.
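The dynamic structure mentioned in the proof can be a binary heap. The sketch below is an illustration under assumptions: the function name is ours, the tree itself is not stored, and only the labels $(a, b)$ of the leaves are kept in a heap ordered by cost, which is enough to obtain the weights of the successive Z-trees.

```python
import heapq

def min_z_tree_weights(n):
    """Weights of one Z-tree of each rank 1..n (n >= 1)."""
    heap = [(0, 0, 0)]                 # (cost, a, b) of the single leaf of the 1-tree
    weight, weights = 0, [0]
    for _ in range(n - 1):
        c, a, b = heapq.heappop(heap)  # a leaf of minimal cost, O(log n)
        weight += c                    # splitting a leaf increases the weight by its cost
        for a2, b2 in ((a + 1, b), (a, b + 1)):
            heapq.heappush(heap, (a2 * b2 + a2 + b2, a2, b2))
        weights.append(weight)
    return weights
```

Each iteration performs one extraction and two insertions, which gives the $O(n \log n)$ bound of Theorem 2. Ties between leaves of equal cost are broken arbitrarily by the heap, in line with the fact that several Z-trees of the same rank may exist.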

It is clear that for a given n, there may exist several Z-trees of rank n. 

In the following we introduce a subclass of full binary trees (called P-trees), 
for which the Z-trees are unique. We do that in order to study the size-complexity 
of the reduced automata (the number of transitions). 

5.1 P-Trees

We denote by $M_t$ the set of leaves having minimal cost in $t$, that is: $M_t = \arg\min_{v \in L_t} c_v$. The function SplitAll(t) returns the tree obtained from $t$ by replacing every leaf in $M_t$ by the unique 2-tree. See Figure 5.



Fig. 5. An 8-tree $t$ with a Split(t) tree and its SplitAll(t) tree. Values in the nodes are costs.



Definition 3. The class $(t_n)_{n > 0}$ of P-trees is defined inductively as follows:

- $t_1$ = 1-tree and

- $t_{(n+1)} = \mathrm{SplitAll}(t_n)$.

See Figure 6.

Remark 2. Notice that if $v \in M_{t_n}$ then $c_v = (n - 1)$. This can be established by induction on $n$. Therefore, to get $t_{(n+1)}$ from $t_n$ we split leaves of cost $(n - 1)$.




Fig. 6. The first four P-trees. Leaves with a big circle have minimal cost.

5.2 Eratosthenes-Pascal's Triangle

The Eratosthenes-Pascal's Triangle is constructed from Pascal's Triangle as follows: we interleave each column $k$ of Pascal's Triangle with $(k - 1)$ zeros. The element of the Eratosthenes-Pascal's Triangle at the $n$th row and the $k$th column is denoted by $T_n^k$ with $n \ge 1$ and $k \ge 1$.

Let $n$ be a natural number. We denote by $D_n$ the set of divisors of $n$.

Proposition 6.
$$T_n^k = \begin{cases} \binom{\frac{n}{k} + k - 2}{k - 1} & \text{if } k \in D_n, \\ 0 & \text{otherwise.} \end{cases}$$

Proof. The element of Pascal's Triangle at the $i$th row and the $j$th column is $\binom{i-1}{j-1}$. This element is moved in the Eratosthenes-Pascal's Triangle to the row $r = (i - j + 1)j$ in the same column $j$. We have then $i = \frac{r}{j} + j - 1$. Thus $T_r^j = \binom{\frac{r}{j} + j - 2}{j - 1}$.

Let $S_n$ be the sum of the $n$th row in the Eratosthenes-Pascal's Triangle:
$$S_n = \sum_{k=1}^{n} T_n^k = \sum_{k \in D_n} \binom{\frac{n}{k} + k - 2}{k - 1}.$$
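The entries $T_n^k$ and the row sums $S_n$ are easy to tabulate. The following sketch uses the closed form of Proposition 6 directly (the function names `T`, `S` and `row` are ours):

```python
from math import comb

def T(n, k):
    """Entry of the Eratosthenes-Pascal's Triangle at row n, column k (n, k >= 1)."""
    return comb(n // k + k - 2, k - 1) if n % k == 0 else 0

def S(n):
    """Sum of the n-th row."""
    return sum(T(n, k) for k in range(1, n + 1))

def row(n):
    """The n-th row as a list of n entries."""
    return [T(n, k) for k in range(1, n + 1)]
```

For a prime $p$ only the columns $1$ and $p$ divide $p$, so the row contains two entries equal to 1 and $S_p = 2$; this observation is used again in Section 6.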



5.3 P-trees and Eratosthenes-Pascal's Triangle 

In this section we describe the link between the Eratosthenes-Pascal's Triangle and the set of P-trees.

Theorem 3. The sum $S_n$ of the elements of the $n$th row of the Eratosthenes-Pascal's Triangle is exactly $|M_{t_n}|$, the number of leaves of minimal cost in the P-tree $t_n$.

To prove this theorem we introduce a family $F_n$ which is in bijection with both the rows of the Eratosthenes-Pascal's Triangle and the P-trees. We focus on the following question: given a natural number $n$, what are all the possible paths that have $(n - 1)$ as cost? To answer this question, we define $F_{(n-1)}$ as the set of all paths of cost $(n - 1)$:

$$F_{(n-1)} = \{\pi \mid c_\pi = (n - 1)\}$$

Lemma 2. For all $n \ge 1$, $|F_{(n-1)}| = S_n$.

Proof. Let $F^i_{(n-1)}$, for $0 \le i \le (n - 1)$, be the set of paths of $F_{(n-1)}$ having $i$ left edges. We have $F_{(n-1)} = \bigcup_{i=0}^{n-1} F^i_{(n-1)}$ where

$$\begin{aligned} F^i_{(n-1)} &= \{\pi \in F_{(n-1)} \mid a_\pi = i\} \\ &= \{\pi \mid (a_\pi b_\pi + a_\pi + b_\pi = (n - 1)) \wedge (a_\pi = i)\} \\ &= \{\pi \mid (b_\pi = \tfrac{n}{i+1} - 1) \wedge (a_\pi = i)\} \end{aligned}$$

So, we get
$$|F^i_{(n-1)}| = \begin{cases} \binom{\frac{n}{i+1} + i - 1}{i} & \text{if } (i + 1) \in D_n, \\ 0 & \text{otherwise.} \end{cases}$$

This corresponds to the different ways to arrange $i$ left edges in a path of length $\frac{n}{i+1} + i - 1$. Let $k = i + 1$; then

$$|F^{k-1}_{(n-1)}| = \begin{cases} \binom{\frac{n}{k} + k - 2}{k - 1} & \text{if } k \in D_n, \\ 0 & \text{otherwise.} \end{cases}$$

Thus $|F^{k-1}_{(n-1)}| = T_n^k$. That is, for $0 \le i < n$ we get $|F^i_{(n-1)}| = T_n^{i+1}$. Finally

$$|F_{(n-1)}| = \sum_{i=0}^{n-1} |F^i_{(n-1)}| = \sum_{i=0}^{n-1} T_n^{i+1} = \sum_{k=1}^{n} T_n^k = S_n.$$

Therefore, we can associate the $n$th row of the Eratosthenes-Pascal's Triangle to $F_{(n-1)}$.



Fig. 7. All paths of cost 5. (a) The unique path of cost 5 with no left edges, which corresponds to $\binom{5}{0} = 1$. (b) The three paths of cost 5 with one left edge, which correspond to $\binom{3}{1} = 3$. (c) The three paths of cost 5 with two left edges, which correspond to $\binom{3}{2} = 3$. (d) The unique path of cost 5 with five left edges, which corresponds to $\binom{5}{5} = 1$. There are no paths of cost 5 with three or four left edges; these correspond to the zeros in the 6th row of the Eratosthenes-Pascal's Triangle.
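Lemma 2 can be checked by direct enumeration for small costs. In the sketch below (our own encoding) a path is a word over the letters L and R, read from a node up to the root; since the cost of a path is at least its length, it suffices to enumerate words of length at most $c$.

```python
from itertools import product

def paths_of_cost(c):
    """All words over {L, R} whose cost a*b + a + b equals c."""
    found = []
    for length in range(c + 1):
        for word in product("LR", repeat=length):
            a = word.count("L")
            b = length - a
            if a * b + a + b == c:
                found.append("".join(word))
    return found

paths = paths_of_cost(5)
# number of paths of cost 5, grouped by their number of left edges
by_left_edges = {}
for p in paths:
    by_left_edges[p.count("L")] = by_left_edges.get(p.count("L"), 0) + 1
```

The grouping by number of left edges reproduces the non-zero entries 1, 3, 3, 1 of the 6th row of the Eratosthenes-Pascal's Triangle, as in Figure 7.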

Lemma 3. For all $n \ge 1$, $|F_{(n-1)}| = |M_{t_n}|$.

Proof. We show that each set $F_n$ is associated with the P-tree $t_n$. We proceed by induction. Assume that the P-tree $t_n$ contains all paths of cost less than or equal to $n$, that is, for all $k \le n$, $F_k$ is in the P-tree $t_n$. From this hypothesis we show that $t_{(n+1)} = \mathrm{SplitAll}(t_n)$ contains all the paths of cost less than or equal to $(n + 1)$. Of course, $t_{(n+1)}$ contains all the paths of cost less than or equal to $n$, because the tree $t_{(n+1)}$ is obtained from $t_n$. However, does $t_{(n+1)}$ contain all paths of cost equal to $(n + 1)$? To answer this question, we argue by contradiction. Let $\pi$ be a path with cost $c_\pi = (n + 1)$ which is not in $t_{(n+1)}$. It is clear that the cost of the father of $\pi$ is:

$$c_{f_\pi} = \begin{cases} (a_\pi - 1) b_\pi + (a_\pi - 1) + b_\pi & \text{if } \pi \text{ is a left child}, \\ a_\pi (b_\pi - 1) + a_\pi + (b_\pi - 1) & \text{if } \pi \text{ is a right child}. \end{cases}$$

As $c_{f_\pi} < c_\pi$, we have $c_{f_\pi} \le n$. From this we deduce that when $c_{f_\pi} < n$, the father of $\pi$ was split by the SplitAll() function in an earlier tree, or it will be split in the current tree $t_n$ in the case where $c_{f_\pi} = n$ (see Remark 2). In both cases, the path $\pi$ is necessarily in $t_{(n+1)}$.

From the last two lemmas, we obtain a bijection between the P-trees and the rows of the Eratosthenes-Pascal's Triangle, and we claim the following corollary:

Corollary 1. Let $t_n$ be a P-tree. Then $|M_{t_n}| = |F_{(n-1)}| = S_n$.

From Corollary 1 we have $|M_{t_n}| = S_n = \sum_{k \mid n} \binom{\frac{n}{k} + k - 2}{k - 1}$.

Proposition 7. Let $t_n$ be a P-tree. Its size $s(t_n) = |L_{t_n}|$ (the number of leaves) and its weight $w(t_n)$ are:
$$s(t_n) = 1 + \sum_{i=1}^{n-1} S_i \quad\text{and}\quad w(t_n) = \sum_{i=2}^{n} (i - 2)\, S_{i-1}.$$

Proof. From Corollary 1 and Remark 2, the size of the tree $t_n$ is the sum of the size of the tree $t_{n-1}$ and the number of all split minimal leaves, that is, $s(t_n) = s(t_{n-1}) + |M_{t_{n-1}}|$. Likewise, the weight of the tree $t_n$ is the sum of the weight of the tree $t_{n-1}$ and the costs of all split minimal leaves, that is, $w(t_n) = w(t_{n-1}) + (n - 2)\,|M_{t_{n-1}}|$.

From Proposition 4, the P-tree $t_n$ (which is a Z-tree) corresponds to a $Z(A_{s(t_n)})$ partition system. The CFS automaton associated with $Z(A_{s(t_n)})$ has $w(t_n)$ transitions.
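Corollary 1 and Proposition 7 can be cross-checked by simulating SplitAll on the multiset of leaf labels. The sketch below is an illustration under assumptions: the function names are ours, and a P-tree is represented only by a `Counter` mapping each label $(a, b)$ to the number of leaves carrying it.

```python
from collections import Counter
from math import comb

def S(n):
    """Row sum of the Eratosthenes-Pascal's Triangle (Proposition 6)."""
    return sum(comb(n // k + k - 2, k - 1) for k in range(1, n + 1) if n % k == 0)

def p_tree_leaves(n):
    """Multiset of leaf labels (a, b) of the P-tree t_n, obtained by n-1 SplitAll steps."""
    leaves = Counter({(0, 0): 1})                    # t_1 is the 1-tree
    for _ in range(n - 1):
        cmin = min(a * b + a + b for (a, b) in leaves)
        nxt = Counter()
        for (a, b), m in leaves.items():
            if a * b + a + b == cmin:                # leaves of minimal cost are split
                nxt[(a + 1, b)] += m
                nxt[(a, b + 1)] += m
            else:
                nxt[(a, b)] += m
        leaves = nxt
    return leaves

def size(leaves):
    return sum(leaves.values())

def weight(leaves):
    return sum(m * a * b for (a, b), m in leaves.items())

def minimal_leaves(leaves):
    cmin = min(a * b + a + b for (a, b) in leaves)
    return sum(m for (a, b), m in leaves.items() if a * b + a + b == cmin)
```

For example, `p_tree_leaves(4)` describes a tree with 6 leaves and weight 6, of which 4 leaves have the minimal cost 3.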

With our construction, the CFS automaton associated with a $Z(A_n)$ CFS partition system may contain several initial states. However, in order to compare the number of minimal automata and their number of transitions with those obtained by R. Cox [3] (see Table 1), we must restrict our study to CFS partition systems leading to automata with a unique initial state. It is easy to verify that
n— 1 q n .g 

in this case a P-tree t„ has s(t n ) = 1 + ^ ~~2~ an< ^ w ftn) = — ■ 

i=l i=i 

n       1   2   3   4   5   6   7     8     9    10   11   12   13   14
(i)     1   3   6   9  13  18  23  ≤ 28  ≤ 34  ≤ 41    ?    ?    ?    ?
(ii)    1   1   2   1   1   4   6   ≥ 1   ≥ 1   ≥ 1    ?    ?    ?    ?
(iii)   1   3   6   9  13  18  23    28    33    39   46   53   60   67
(iv)    1   1   2   1   1   4   6     4     1     1    5   10   10    5

Table 1. Comparison table. (i) Minimal transition number estimated by R. Cox [3]. (ii) Number of minimal automata estimated by R. Cox [3]. (iii) Number of transitions in our reduced automaton. (iv) Number of reduced automata estimated by our approach.



6 Asymptotic behavior of the number of transitions

Appendix A contains the basics of asymptotic notations. In this section we establish one of our main results, which concerns the asymptotic behavior of the number of transitions $w(t_n)$, where $s(t_n)$ is the number of states. Namely, we show that the weight of our automata is asymptotically equivalent to $s(t_n) \log^2 s(t_n)$ up to a constant, which means that the number of transitions is minimal in the sense that we reach the lower bound of Schnitger [13]. Indeed, we have

Theorem 4. $\log^2(4)\, w(t_n) \sim s(t_n) \log^2 s(t_n)$.

As a consequence we obtain the following

Corollary 2. For large $n$, we have
$$w(t_n) < s(t_n) \log^2 s(t_n).$$

It is also easy to deduce the following

Corollary 3. $\dfrac{s(t_n) \log^2 s(t_n)}{\log \log s(t_n)} = o(w(t_n))$.

Before starting the proof of Theorem 4, observe that according to Corollary 1 combined with Proposition 7 one may consider that $w(t_n)$ and $s(t_n)$ are given by

$$s(t_n) = 1 + \sum_{i=1}^{n-1} S_i \quad\text{and}\quad w(t_n) = \sum_{i=1}^{n-2} i\, S_{i+1}.$$
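These two sums can be evaluated numerically to observe the quantities compared in this section. The sketch below is only exploratory (the function names are ours; `log` is the natural logarithm, an assumption about the convention used in Theorem 4); it is not a substitute for the proofs that follow.

```python
from math import comb, log

def S(i):
    """Row sum of the Eratosthenes-Pascal's Triangle."""
    return sum(comb(i // k + k - 2, k - 1) for k in range(1, i + 1) if i % k == 0)

def s(n):
    """Number of leaves of the P-tree t_n."""
    return 1 + sum(S(i) for i in range(1, n))

def w(n):
    """Weight of the P-tree t_n."""
    return sum(i * S(i + 1) for i in range(1, n - 1))

def ratios(n):
    """Return w/((n-2)s) and log^2(4) w / (s log^2 s) for n >= 3."""
    sn, wn = s(n), w(n)
    return wn / ((n - 2) * sn), (log(4) ** 2 * wn) / (sn * log(sn) ** 2)
```

Since every term of $w(t_n)$ is at most $(n-2) S_{i+1}$, the first ratio always lies strictly between 0 and 1 for $n \ge 3$; Proposition 11 below states that it tends to 1.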

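As usual in number theory, any arithmetical function $f : \mathbb{Z} \to \mathbb{R}$ can be extended to the real line by putting, for any $x \in \mathbb{R}$, $f(x) = f(\lfloor x \rfloor)$. Therefore, for any $x \ge 2$, we have

$$s(t_x) = 1 + \sum_{i=1}^{\lfloor x \rfloor - 1} S_i \quad\text{and}\quad w(t_x) = \sum_{i=1}^{\lfloor x \rfloor - 2} i\, S_{i+1}.$$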

We recall that the classical arithmetical function $\pi(x)$ denotes the number of primes not exceeding $x$. We shall also need the following classical identity due to Abel.

Theorem 5 (Abel's identity, [1]). For any arithmetical function $a(n)$ let

$$A(x) = \sum_{n \le x} a(n),$$

where $A(x) = 0$ if $x < 1$. Assume $f$ has a continuous derivative on the interval $[y, x]$, where $0 < y < x$. Then we have

$$\sum_{y < n \le x} a(n) f(n) = A(x) f(x) - A(y) f(y) - \int_y^x A(u) f'(u)\, du. \quad (1)$$

From Abel's identity we easily deduce the following lemma.
Lemma 4. For any integer $n > 1$, we have

/n-2 
s(t u )du, 

where $\tilde{s}(t_u) = s(t_u) - 1$.

We need to estimate $w(t_n)$ with respect to $s(t_n)$. For that, we shall need the following weaker form of the Prime Number Theorem (WPNT for short) due to Chebyshev.

Theorem 6 ([1]). For every integer $n \ge 2$ we have

$$\frac{1}{6} \frac{n}{\log(n)} < \pi(n) < 6\, \frac{n}{\log(n)}. \quad (2)$$

From the WPNT we deduce the following crucial proposition.

Proposition 8. For all $u \ge 4$ we have

$$\tilde{s}(t_u) \ge \frac{1}{3} \frac{u - 2}{\log(u - 2)}.$$

Proof. We can show that for any prime number $p \ge 2$, we have $S_p = 2$. Hence $\tilde{s}(t_u) \ge 2\pi(\lfloor u \rfloor - 1)$. Consequently, by the WPNT we get

$$\tilde{s}(t_u) \ge \frac{1}{3} \frac{\lfloor u \rfloor - 1}{\log(\lfloor u \rfloor - 1)}.$$

But the function $x \in [3, +\infty[ \mapsto \frac{x}{\log x}$ is increasing. It follows that, for all $u > 3$,

$$\tilde{s}(t_u) \ge \frac{1}{3} \frac{u - 2}{\log(u - 2)},$$

which completes the proof of the proposition.
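For any $x \ge 2$ and any positive integer $n$, let

$$\mathrm{Li}_n(x) = \int_2^x \frac{dt}{\log^n(t)}.$$

Let us summarize in the following proposition some classical well-known results on $\mathrm{Li}_n(x)$ that we shall use.

Proposition 9. For every $x \ge 2$ and integer $n \ge 1$, we have

$$\mathrm{Li}_1(x) = \frac{x}{\log(x)} + \mathrm{Li}_2(x) - \frac{2}{\log(2)}, \quad (4)$$

$$\mathrm{Li}_n(x) = \frac{x}{\log^n(x)} + n\, \mathrm{Li}_{n+1}(x) - \frac{2}{\log^n(2)}. \quad (5)$$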

Now we are able to formulate our key estimate of $w(t_n)$ with respect to $s(t_n)$ in the following proposition.

Proposition 10. $\displaystyle \limsup_{n \to \infty} \frac{w(t_n)}{(n - 2)\, \tilde{s}(t_n)} \le 1$.

Proof. Applying Lemma 4 we have, for any $n > 3$,

•n-2 

u(tn) _ 1 3 1 

(n - 2)s(i„) ~ (n - 2)?(t n ) (n - 2)s(t„) 

Therefore, for any n > 5, we have 

u>(t n 



s(t u )du. 



< 1 



(n - 2)a(i n ) " (n - 2)s(t n ) 
From Proposition [8] we deduce, that for any n > 5 we have 



3 1 

< 



(n - 2)s(t„) " log(n - 2) 
Hence, letting $n$ go to $\infty$, we obtain

3 



(n — 2)s(t n ) n-too 

Which implies that 



> 0. 



r t w ( <n ) \ / 1 
limsup — — — - < 1, 



and this finishes the proof of the proposition.

We are reduced to comparing the sequences $(n - 2)s(t_n)$ and $s(t_n) \log^2 s(t_n)$. For that we shall estimate $s(t_x)$. Precisely, we claim the following

Theorem 7. For large $x > 0$ we have



i^- 1 3, , 4^" X 
a; 4 ;=— < s{t x ) < x 4 - iog(x) 



V 71 " V 71 " 
The proof of Theorem 7 will be given later. For now, assuming that Theorem 7 holds, we extend Proposition 10 as follows.

Proposition 11. The sequences $w(t_n)$ and $(n - 2)s(t_n)$ are equivalent. That is,

$$\frac{w(t_n)}{(n - 2)\, s(t_n)} \xrightarrow[n \to \infty]{} 1.$$



Proof. By Lemma 4, write

(n-2)a(* n ) ~ 



»n— 2 



(n - 2)s(t„) (n - 2)s(i n ) 



s(t x )dx. 



Let $\varepsilon > 0$ and $x$ sufficiently large. Then, by Theorem 7, for large $x$, we have



x 4 — ;=— < s^J < a; 4 log(x) — ^— . 



'7T 



'7T 



(6) 



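But

$$\begin{aligned} \int_2^{n-2} 4^{\sqrt{x}}\, dx &\overset{u = \sqrt{x}}{=} \int_{\sqrt{2}}^{\sqrt{n-2}} 2u\, 4^{u}\, du \\ &= \left[ \frac{2u\, 4^{u}}{\log(4)} - \frac{2 \cdot 4^{u}}{\log^2(4)} \right]_{\sqrt{2}}^{\sqrt{n-2}} \\ &= \frac{2\sqrt{n-2}\; 4^{\sqrt{n-2}}}{\log(4)} - \frac{2\sqrt{2}\; 4^{\sqrt{2}}}{\log(4)} - \frac{2 \cdot 4^{\sqrt{n-2}}}{\log^2(4)} + \frac{2 \cdot 4^{\sqrt{2}}}{\log^2(4)}. \end{aligned} \quad (7)$$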

Since ^ n _^j s ^ t \ vanishes at the infinity and s(t n ) is equivalent to s(t n ) we may 
assume that © holds starting from 2 and Theorem[7|is valid for s(t n ). Therefore 



*n-2 



»ra-2 



(n-2)?(t„) 



s(t x )dx < 



(n- 2)(n- 2) 4 4V^ 



x 4 log(x) 4 v ^dx 



< 



(n- 2)i log(n-2) 
(n-2)(n-2)f 4v^ 



4^cfa 



< 



log(n - 2) 
(n- 2)4V^ 



4^cfa 



(8) 

From (7) combined with (8) it follows that

/n-2 
s(t x )d X < 

2^/n=2 l g(n - 2) 2 .4^ bg(n - 2) 

log(4) (n-2)4v^ log 2 (4) (ra - 2)4V" 

We conclude that 

w(t„) 

> 1. 



* 0. 



(n — 2)s(t n ) n-+oo 

which proves the proposition. 

It remains to prove Theorem 7. For that we shall need the following classical lemma. Its proof can be found in [4]. Nevertheless we include the proof for the sake of completeness.

Lemma 5. $\displaystyle \binom{2n}{n} \sim \frac{4^n}{\sqrt{\pi n}}$.

Proof. By Stirling's formula we have

$$n! \sim n^n e^{-n} \sqrt{2\pi n}.$$

Hence

$$\binom{2n}{n} = \frac{(2n)!}{(n!)^2} \sim \frac{4^n}{\sqrt{\pi n}}.$$

This finishes the proof of the lemma.
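The equivalence of Lemma 5 is easy to observe numerically. The sketch below (the function name `central_ratio` is ours) computes the ratio between the central binomial coefficient and its asymptotic equivalent using exact integer arithmetic for the binomial part.

```python
from math import comb, pi, sqrt

def central_ratio(n):
    """Ratio of binom(2n, n) to 4^n / sqrt(pi * n); tends to 1 as n grows."""
    # comb(...) / 4 ** n is a true division of exact integers, so no overflow occurs
    return comb(2 * n, n) / 4 ** n * sqrt(pi * n)
```

The ratio approaches 1 from below as $n$ grows, in agreement with the lemma.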

Proof (of Theorem 7). For $x \ge 2$, write

n<x d\n 

q + d-2 
d-l 



E 

dq<x 

L*J LfJ 

EE 

d=l g=l 



q + d-2 
d-l 



From this we see that 



d=l L J 



< a; log(x)( 2( ^ J J _ 1 1) ), (9) 
and



> L*J( 2( L ^ J _ i^). (io) 



Using the relation $\lfloor x \rfloor = x + O(1)$ combined with (9) and (10), we obtain



.r 



LacJ - 1 ; ~ v ^ ~ -""bv-'^ - 1 
By Lemma 5, this gives

x 4 — =- < s(t x ) < x± \og(x) — — , 
V 71 " V 71 " 

which proves the theorem. 

Now we are able to give the proof of Theorem 4.

Proof (of Theorem 4). By Proposition 11 it is sufficient to show that

$$s(t_n) \log^2(s(t_n)) \sim \log^2(4)\, (n - 2)\, s(t_n).$$

For that, observe that we have

$$\frac{s(t_n) \log^2(s(t_n))}{(n - 2)\, s(t_n)} = \frac{\log^2(s(t_n))}{n - 2}.$$

Applying Theorem 7, we deduce that

$$\log(s(t_n)) \sim \log(4) \sqrt{n}.$$

Whence

$$\log^2(s(t_n)) \sim \log^2(4)\, n.$$

Hence

$$\frac{\log^2(s(t_n))}{n - 2} \xrightarrow[n \to \infty]{} \log^2(4).$$

We deduce that

$$\frac{w(t_n)}{s(t_n) \log^2 s(t_n)} \xrightarrow[n \to \infty]{} \frac{1}{\log^2(4)} < 1.$$

This finishes the proof of the theorem.



7 Conclusion 

In this paper we show how binary trees can be used to design a fast algorithm for computing an automaton with a reduced$^5$ number of transitions recognizing the language $L(E_n)$. We have verified that our algorithm gives the minimal number of transitions for $n = 1$ to $7$ (see Table 1) and we have shown that our reduction is asymptotically a minimization. Hence, we conjecture that Algorithm 3 computes the minimal transition automaton.

$^5$ Asymptotically minimal.

A Asymptotic notations

Following [4], we employ the standard asymptotic notation, called Bachmann-Landau notation, as follows. Let $\mathbb{S}$ be a set and $s_0 \in \mathbb{S}$ a particular element of $\mathbb{S}$. We assume a notion of neighbourhood to exist on $\mathbb{S}$. Examples are $\mathbb{S} = \mathbb{Z}_{>0} \cup \{+\infty\}$ with $s_0 = +\infty$; $\mathbb{S} = \mathbb{R}$ with $s_0$ any point in $\mathbb{R}$; $\mathbb{S} = \mathbb{C}$ or a subset of $\mathbb{C}$ with $s_0 = 0$; and so on. Two functions $f$ and $g$ from $\mathbb{S} \setminus \{s_0\}$ to $\mathbb{R}$ or $\mathbb{C}$ are given.

- $O$-notation: write
$$f(s) \underset{s \to s_0}{=} O(g(s)),$$
if the ratio $\frac{f(s)}{g(s)}$ stays bounded as $s \to s_0$ in $\mathbb{S}$. In other words, there exists a neighbourhood $V$ of $s_0$ and a constant $C > 0$ such that
$$|f(s)| \le C\,|g(s)|, \quad s \in V,\ s \ne s_0.$$
One also says that "$f$ is of order at most $g$", or "$f$ is big-Oh of $g$" (as $s$ tends to $s_0$).

- $o$-notation: write
$$f(s) \underset{s \to s_0}{=} o(g(s)),$$
if the ratio $\frac{f(s)}{g(s)}$ tends to $0$ as $s \to s_0$ in $\mathbb{S}$. In other words, for any (arbitrarily small) $\varepsilon > 0$, there exists a neighbourhood $V_\varepsilon$ of $s_0$ (depending on $\varepsilon$), such that
$$|f(s)| \le \varepsilon\,|g(s)|, \quad s \in V_\varepsilon,\ s \ne s_0.$$
One also says that "$f$ is of order smaller than $g$", or "$f$ is little-oh of $g$" (as $s$ tends to $s_0$).

- $\sim$-notation: write
$$f(s) \underset{s \to s_0}{\sim} g(s),$$
if the ratio $\frac{f(s)}{g(s)}$ tends to $1$ as $s \to s_0$ in $\mathbb{S}$. One also says that "$f$ and $g$ are asymptotically equivalent" (as $s$ tends to $s_0$).

- $\Omega$-notation: write
$$f(s) \underset{s \to s_0}{=} \Omega(g(s)),$$
if the ratio $\frac{f(s)}{g(s)}$ stays bounded from below in modulus by a non-zero quantity as $s \to s_0$ in $\mathbb{S}$. This means that there exists $k > 0$ and a neighbourhood $V$ of $s_0$, such that
$$|f(s)| \ge k\,|g(s)|, \quad s \in V.$$
One then says that $f$ is of order at least $g$.

- $\Theta$-notation: if $f(s) = O(g(s))$ and $f(s) = \Omega(g(s))$, write
$$f(s) \underset{s \to s_0}{=} \Theta(g(s)).$$
This implies that there exist $k, C > 0$ and a neighbourhood $V$ of $s_0$, such that
$$k\, g(s) \le f(s) \le C\, g(s), \quad s \in V.$$
One then says that $f$ is of order exactly $g$.

At this point we are able to draw a parallel between the history of our contribution and the history of the famous Prime Number Theorem (PNT), which we use in its weaker form. The PNT concerns the asymptotic behavior of the prime-counting function $\pi(x) = |\{p \le x,\ p \text{ prime}\}|$. Using asymptotic notation the PNT can be restated as

$$\pi(x) \underset{x \to +\infty}{\sim} \frac{x}{\log x}.$$

The behavior of $\pi(x)$ has been the object of intense study by many celebrated mathematicians ever since the eighteenth century. Inspection of tables of primes led Gauss (1792) and Legendre (1798) to conjecture the PNT. In 1808 Legendre published the formula $\pi(x) = x/(\log x + A(x))$, where $A(x)$ tends to a constant $B = -1.08366$ as $x \to +\infty$, which means that $\pi$ is $\Omega(x/\log(x))$.
According to Bateman and Diamond [12], the first person to establish the true order of $\pi(x)$ was P. L. Chebyshev. Indeed, in two papers from 1848 and 1850, Chebyshev proved that $\pi(x)$ is $\Theta(x/\log(x))$. This result is nowadays known as Chebyshev's Theorem.

Finally, in 1896 the PNT was first proved by Hadamard and de la Vallee Poussin. Their proofs were long and intricate. A simplified modern presentation is given on pages 41-47 of Titchmarsh's book on the Riemann Zeta function [14].



